On the Estimation of Treatment Effect with Text Covariates

发布于 2020-02-22 更新于 2020-02-25 causal

On the Estimation of Treatment Effect with Text Covariates

本文发表于2019年IJCAI会议。

Introduction

Treatment effect，是指一个变量（treatment）对另一个变量（outcome）的作用，定义为对treatment进行干预后结果的改变。本文是研究用文本协变量（textual covariates）估计treatment effect。所谓协变量是一个独立变量（解释变量），不为实验所操纵，但仍影响实验结果。不同于数值型协变量，文本型协变量受到的关注更少而且包含更多信息以及层次（word level，topic level，semantics level）。文本数据的这一特性给文本协变量的处理效果估计带来了新的挑战。特别是，一些文本协变量对treatment assignment有很强的预测作用，但对outcome却没有那么大的预测作用，这种协变量被称为nearly instrumental variables（近工具变量）。现有的工作已经表明，对nearly instrumental variables的调节往往会放大因果效应分析中的偏差。因此，在评价treatment effect时应将其排除。然而,当协变量包含文本数据,re-weighting或feature selection based approaches将是有限的,因为这些方法只能被限制在一个特定级别的信息文本中包含的变量,从而不足以总结文本的协变量,进一步导致不足以过滤掉nearly instrumental variables。

Preliminaries

首先我们定义一些符号：

W：treatment。假设数据集中有N个records，$W_i$是第i条records的treatment。
X：non-textual covariate。
T：textual covariate。$T={T_1,T_2,..T_N}$，$T_i$属于第i条记录，$T_i=[t_{i,1},t_{i,2},…,t_{i,N_{t_i}}]$，$t_{i,j}$是第i条记录的第j个单词。
$Y^w_i$：第i个records的用第w个treatment的潜在结果。
$Y^F$：观察结果。
仍然遵循potential outcome framework和假设（Stable Unit Treatment Value Assumption，Consistency，Ignorability，Positivity），以确保治疗效果可以确定。最后，整个问题的输入输出定义如下：
input：非文本协变量X，文本协变量T，treatment assignment W和观测结果$Y^F$
output：ITE（ individual treatment effect），ATE（average treatment effect），ATT（average treatment effect on the treated group）。利用上述假设，他们定义如下：
$ITE_i=E[Y^1|X=x_i]-E[Y^0|X=x_i]=Y^1_i-Y^0_i$
$ATE=E_U[Y^1-Y^0]=\frac{1}{|U|} \sum_{i \in U}(Y^1_i-Y^0_i)$
$ATT=E_{U_1}[Y^1-Y^0]=\frac{1}{|U_1|} \sum_{i \in U_1}(Y^1_i-Y^0_i)$

Methodology

作者提出的 conditional treatment-adversarial learning based matching method (CTAM)的框架如下所示：

如图所示，CTAM包含3个主要组件：text processing, representation learning, and conditional treatment discriminator。通过text processing组件，将原始文本转换为矢量化的表示形式S，然后将S与非文本协变量X连接，构造统一的特征向量，并将其输入表示神经网络，得到潜在表示Z。学习表征后，将Z和可能的结果Y输入conditional treatment discriminator。通过阻止鉴别器分配正确的处理，representation learning可以过滤掉与nearly instrumental variables相关的信息。最后的匹配过程在表示空间Z中执行。

Text Processing and Representation Learning

文本的数值表示是将所有词的词向量进行平均（有点粗糙），得到S。然后将S和数值型协变量X拼接成C，利用一个表示神经网络（实际中，由一个带RELU的 feed-forward neural network来实现）来得到latent representation Z，$Z=\Phi_{rep}(C;\Theta_\Phi)$。然而Z包含nearly instrumental variables的信息，接下来说明如何处理这一问题。

Conditional Treatment Discriminator and Conditional Treatment-Adversarial Learning

Conditional Treatment Discriminator：输入为Z和potential outcome Y，输出为treatment assignment W。在conditional treatment discriminator的作用下，学习后的潜在表示可以消除treatment assignment的conditional dependency。conditional treatment discriminator是一个 feed-forward neural network，定义为$D(Z,Y; \Theta_D)$，其中$\Theta_D$是其参数。目标是正确的预测treatment assignment。损失函数为交叉熵损失，定义为：$L_D(\Theta_D,\Theta_\Phi,\Theta_\Psi)=E_{(c,w)~(C,W)}[-logD(\Phi_{rep}(c;\Theta_\Phi),\Psi_{pop}(\Phi_{rep}(c);\Theta_\Psi);\Theta_D)]$。我们来解释一下这个损失函数，$\Phi_{rep}(c;\Theta_\Phi)$就是$Z_c$，$\Psi_{pop}(\Phi_{rep}(c);\Theta_\Psi)$是pseudo outcome predictor with $\Theta_\Psi$as its parameters。我们将在下一部分介绍这个。
Pseudo Outcome Prediction：conditional treatment discriminator需要所有treatment的potential outcomes，表述如下$Y=\Psi_{pop}(\Phi_{rep}(C);\Theta_\Phi)={\Psi_i(\Phi_{rep}(C);\Theta^{(i)}\Phi)}^{n_w}{i=1}$，$\Psi_i$是每种treatment的预测结果。
Conditional Treatment-Adversarial Learning：conditional treatment-adversarial learning的目标就是过滤掉和nearly instrumental variables相关的信息。由于nearly instrumental variables指的是对treatment assignment比outcomes更有预测性的变量，因此这种过滤策略相当于去除潜在表征和treatment assignment之间的条件依赖性。为此，我们训练了一个对抗学习模型来实现这一目标。判别器D试图最小化损失L来给出正确的treatment，而$\Phi_{rep}$ 和 $\Psi_{pop}$将最大化这个损失来过滤掉这个信息。当 conditional treatment discriminator被成功地欺骗时，nearly instrumental variables相关的信息可以被成功过滤掉。

Loss Function and Parameter Training

loss function：最终的损失为$L=L_p-\lambda L_D$，其中$L_P(\Theta_\Phi,\Theta_\Psi)$它是group distance和pseudo outcome prediction损失的总和，定义如下：

The first term in measures the pairwise distance between the records sharing the observed outcome label under the same treatment, and the second term measures the pairwise distance between the records that have different observed outcomes. Minimizing the difference of two terms makes similar records close to each other, while dissimilar records far from each other in the representation space. The third term is the pseudo outcome prediction loss, and minimizing it allows better potential outcome predictions for conditional treatment discriminator.
Model Training：
Nearest Neighbor Matching

Experiment

Evaluation Metrics

有以下几个评价标准：

PEHE： precision in estimation of heterogeneous effect，$PEHE=\sqrt{\frac{1}{n} \sum^n_{i=1}(ITE_i-ITE’_i)^2}$.
$\epsilon_{ATE}$：error of ATE estimation，$\epsilon_{ATE}=|ATE-ATE’|$
$\epsilon_{ATT}$：error of ATT estimation，$\epsilon_{ATT}=|ATT-ATT’|$
Dataset
News Dataset（Johansson et al.,2016）：The News dataset studies the effect of viewing devices to the user experience.
IHDP Dataset（Brooks-Gunn et al., 1992）：This dataset is from the Infant Health and Development Program targeting low-birth-weight, premature infants.
CFPB Dataset（[Egami et al., 2018）： The CFPB solicits complaints from consumers across a variety of financial products and then addresses those complaints.

Pytorch自定义损失函数
九月 28日, 2019

Pytorch自定义损失函数本文以自己实现交叉熵损失函数为例（不彻底的实现），来简单说说在pytorch中自定义损失函数。首先我们看一下pytorch中的交叉熵损失函数是什么样的，这里我们简单放张图，网上还是有不少说这个的，可自行百度...
End-To-End Memory Networks
八月 16日, 2019

End-To-End Memory Networks—-本文发表于2015年的NIPS会议。人工智能研究面临的两大挑战是，建立能够在回答问题或完成任务时执行多个计算步骤的模型，以及能够描述顺序数据中的长期依赖关系的模型。在这项工作中，...
Graph Neural Networks with Generated Parameters for Relation Extraction
八月 16日, 2019

Graph Neural Networks with Generated Parameters for Relation Extraction—-本文发表于2019年ACL会议。本文提出了基于生成参数的图神经网络模型(GP-GNNs)...
Context-Aware Representations for Knowledge Base Relation Extraction
八月 14日, 2019

Context-Aware Representations for Knowledge Base Relation Extraction—-本文发表于2017年的EMNLP会议。这是一篇关于句子级别实体关系抽取的论文，考虑到传统方法只...
A Hierarchical Framework for Relation Extraction with Reinforcement Learning
七月 22日, 2019

“A Hierarchical Framework for Relation Extraction with Reinforcement Learning”—-本文发表于2019年的AAAI会议，采用两层结构来进行关系抽取。 Hier...
pytorch中LSTM对变长句子的处理
七月 20日, 2019

pytorch中LSTM对变长句子的处理在自然语言处理问题中，以最简单的句子分类为例（比如句子级别情感分类），来说明LSTM对变长句子的处理。（看了网上关于这部分的介绍，大抵互相复制，但感觉都没有说清楚具体怎么用，通过实践，个人总结了...
hexo-markdown使用笔记
七月 12日, 2019

1.hexo的上下标问题：markdown表示下标可以在下标开始和结束处使用‘~’，但是不知道为啥在Typora好用，但在github网页端不好用。亲测一种非常好用的方法，类似于HTML的标签，即在下标开始和结束时使用sub标签（结束...
Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
七月 12日, 2019

‘Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM’—-本文发表于2018年AAAI...
SenticNet5 - Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings
七月 12日, 2019

“SenticNet5:Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings”—-本文发表于2018年AAAI...
Joint entity and relation extraction based on a hybrid neural network
七月 10日, 2019

“Joint entity and relation extraction based on a hybrid neural network ”————-本文发表于2017年的NeuroComputing期刊上。利用神经网络的方法实现...

Please check the comment setting in config.yml of hexo-theme-Annie!

0.0%